Towards a new Approach for Arabic root extraction: Exploit relations between the word letters and their placement in the word for Arabic root extraction

Author

  • Fatma Abu Hawas
Abstract

This paper presents a new root-extraction approach for Arabic words. The approach tries to assign each Arabic word a unique root without relying on a database of word roots, a list of word patterns, or a list of all the prefixes and suffixes of Arabic words. Unlike most Arabic rule-based stemmers, it tries to predict the positions of the root letters one by one, based on rules and relations among the word's letters and their placement in the word. This paper focuses on two parts of the approach. The first introduces rules to distinguish between the Arabic definite article (ال, āl) and the permanent component (ال, āl) that may be found in any Arabic word. The second classifies Arabic letters into groups according to their positions in the word. The proposed approach is implemented as a system composed of several modules that together extract the word root. The approach has been evaluated on the words of the Holy Quran, and the evaluation results are promising.

Similar resources

Rule-based Approach for Arabic Root Extraction: New Rules to Directly Extract Roots of Arabic Words

Extracting word roots in the Arabic language is very problematic due to the specific morphological and structural changes in the language. To address this problem, several techniques have been proposed. This paper continues the work on identifying and exploiting relationships among Arabic letters for Arabic root extraction begun in [1]. Eight different rules that detect the root letters accordi...

Extracting the roots of Arabic words without removing affixes

Most research in Arabic root extraction focuses on removing affixes from Arabic words. This process adds processing overhead and may remove non-affix letters, which leads to the extraction of incorrect roots. This paper proposes a new approach to this issue by introducing a new algorithm for extracting the roots of Arabic words. The proposed algorithm, which is called the Word Substring ...

Representation of Arabic Words - An Approach Towards Probabilistic Root-Pattern Relationships

In traditional Arabic NLP, a root-pattern relationship has generally been considered a simple relationship, whereas the possibility of treating it as a statistical measure has been largely neglected and never formally considered. This paper therefore attempts to explore some issues involved in considering the classical phenomenon of Arabic root-pattern relationships as pr...

A Markovian approach for arabic root extraction

In this paper, we present an Arabic morphological analysis system that assigns, for each word of an unvoweled Arabic sentence, a unique root depending on the context. The proposed system is composed of two modules. The first one consists of an analysis out of context. In this module, we segment each word of the sentence into its elementary morphological units in order to identify its possible r...

An Approach for Arabic Root Generating and Lexicon Development

This paper presents a novel approach for Arabic root generation and lexicon development. The approach provides three algorithms. In the first algorithm, Arabic word roots are generated using the concept of permutation and combination: the root generator algorithm generates roots by applying permutations to the letters of the Arabic alphabet. Then, the second algorithm is used for developing difference ...

Journal:
  • Computer Science (AGH)

Volume 14

Publication date: 2013